2 research outputs found

Sort vs. Hash Join Revisited for Near-Memory Execution

Authors: Falsafi, Babak; Grot, Boris; Kocberber, Onur; Mirzadeh, Nooshin S.
Publication date: 01/01/2015

Data movement between memory and CPU is a well-known energy bottleneck for analytics. Near-Memory Processing (NMP) is a promising approach for eliminating this bottleneck by shifting the bulk of the computation toward memory arrays in emerging stacked DRAM chips. Recent work in this space has been limited to regular computations that can be localized to a single DRAM partition. This paper examines a Join workload, which is fundamental to analytics and is characterized by irregular memory access patterns. We consider several join algorithms and show that while near-data execution can improve both energy-efficiency and performance, effective NMP algorithms must consider locality, access granularity, and microarchitecture of the stacked memory devices.

Infoscience - École polytechnique fédérale de Lausanne

Edinburgh Research Explorer

The Mondrian Data Engine

Authors: Daglis, Alexandros; Drumond, Mario; Falsafi, Babak; Grot, Boris; Mirzadeh, Nooshin S.; Picorel, Javier; Pnevmatikatos, Dionisios N.; Ustiugov, Dmitrii
Publication venue: Association for Computing Machinery (ACM)
Publication date: 03/05/2017

The increasing demand for extracting value out of ever-growing data poses an ongoing challenge to system designers, a task only made trickier by the end of Dennard scaling. As the performance density of traditional CPU-centric architectures stagnates, advancing compute capabilities necessitates novel architectural approaches. Near-memory processing (NMP) architectures are reemerging as promising candidates to improve computing efficiency through tight coupling of logic and memory. NMP architectures are especially fitting for data analytics, as they provide immense bandwidth to memory-resident data and dramatically reduce data movement, the main source of energy consumption. Modern data analytics operators are optimized for CPU execution and hence rely on large caches and employ random memory accesses. In the context of NMP, such random accesses result in wasteful DRAM row buffer activations that account for a significant fraction of the total memory access energy. In addition, utilizing NMP’s ample bandwidth with fine-grained random accesses requires complex hardware that cannot be accommodated under NMP’s tight area and power constraints. Our thesis is that efficient NMP calls for an algorithm-hardware co-design that favors algorithms with sequential accesses to enable simple hardware that accesses memory in streams. We introduce an instance of such a co-designed NMP architecture for data analytics, the Mondrian Data Engine. Compared to a CPU-centric and a baseline NMP system, the Mondrian Data Engine improves the performance of basic data analytics operators by up to 49× and 5×, and efficiency by up to 28× and 5×, respectively.
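
The abstract's point about row buffer activations can be illustrated with a toy model. The sketch below is not taken from the paper: it assumes a single DRAM bank with an open-page policy, and the row size, word size, and all function names are hypothetical values chosen for illustration. It counts how often a new row must be activated when the same set of addresses is visited sequentially versus in shuffled order.

```python
import random

# Illustrative assumptions, not figures from the paper.
ROW_BYTES = 2048   # assumed size of one DRAM row (row buffer)
WORD_BYTES = 8     # assumed access granularity


def count_row_activations(addresses, row_bytes=ROW_BYTES):
    """Count row activations for one bank that keeps the last row open."""
    activations = 0
    open_row = None
    for addr in addresses:
        row = addr // row_bytes
        if row != open_row:
            # The requested row is not in the row buffer: activate it.
            activations += 1
            open_row = row
    return activations


def sequential_addresses(n_words, word_bytes=WORD_BYTES):
    """Addresses of n_words consecutive words, in streaming order."""
    return [i * word_bytes for i in range(n_words)]


def shuffled_addresses(n_words, word_bytes=WORD_BYTES, seed=0):
    """The same addresses visited in a pseudo-random order."""
    addrs = sequential_addresses(n_words, word_bytes)
    random.Random(seed).shuffle(addrs)
    return addrs


if __name__ == "__main__":
    n = 1 << 16
    print("sequential:", count_row_activations(sequential_addresses(n)))
    print("shuffled:  ", count_row_activations(shuffled_addresses(n)))
```

Under these assumptions the streaming order activates each row once, while the shuffled order reactivates a row on almost every access. A real stacked-DRAM device has multiple banks, vaults, and scheduling policies that this model ignores.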

Infoscience - École polytechnique fédérale de Lausanne

Edinburgh Research Explorer